Lessons learned? Chemical plant safety since Bhopal.

Author

  • Ernie Hood
Abstract

Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).
Background
The diverse phenotypes from an invariant set of genes are controlled by a biochemical process that regulates gene activity [1]. Transcription is central to the regulation mechanisms in the process of gene expression. It is regulated by interplay between transcription factors and their binding sites. Understanding the targets that are regulated by transcription factors in the human genome is highly desirable in the postgenomic era. Some experimental methods, such as footprinting [2] and SELEX (systematic evolution of ligands by exponential enrichment) [3], exist for identifying transcription factor binding sites (TFBSs). Chromatin immunoprecipitation (ChIP)-chip technology was introduced originally to identify genomic binding regions of transcription factors in yeast [4-6]. It was later applied to the human genome [7], and there have been many applications to single human chromosomes. ChIP-chip technology, otherwise known as microarray-based readout of chromatin immunoprecipitation assays, is a procedure for mapping in vivo targets of transcription factors: ChIP with antibodies to a transcription factor of interest isolates protein-bound DNA, and the immunoprecipitated DNA is then used to probe a microarray containing genomic DNA sequences.
Snyder and colleagues [8] mapped nuclear factor (NF)-κB binding sites in human chromosome 22 in a high-throughput manner. A number of other publications have similarly mapped the sites of other transcription factors [9,10]. ChIP-chip technology has been applied to the human genome for a variety of different factors [11]. Additionally, there are related techniques such as ChIP-SAGE (serial analysis of gene expression) [12-14]. Unfortunately, the ChIP-chip technique and its variants are still time consuming, sensitive to physiologic perturbation, and expensive to use for screening TFBSs in the whole genome.
Published: 1 November 2006. Genome Biology 2006, 7:R102 (doi:10.1186/gb-2006-7-11-r102). Volume 7, Issue 11, Article R102, Wang et al. Received: 20 June 2006. Revised: 29 August 2006. Accepted: 1 November 2006. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2006/7/11/R102
Many computational methods for identifying TFBSs have been proposed in the literature [15-17]. Some of the methods attempt to discover potential binding sites for any transcription factor given only a collection of unaligned promoter regions for suspected coregulated genes (for example MEME [18], AlignAce [Gibbs sampling] [19], and BioProspector [20]). Other methods attempt to predict TFBSs for a specific transcription factor given a collection of known binding sites already available [15,21-23]. The method proposed in this paper addresses the latter problem. Consensus sequences or regular expressions are still frequently used to depict the binding specificities of transcription factors.
They represent a somewhat simplistic view of the binding sequence and only work well in highly conserved motifs because they do not contain useful information about the relative likelihood of observing the alternate nucleotides at different positions of a TFBS. However, variability is believed to have a critical impact on the fine regulation of gene expression. This makes it very difficult to identify all potential binding sites without the aid of computational techniques. Another more common method is the profile method, also known as position-specific scoring matrix (PSSM) or position weight matrix [21]. The largest and most commonly used collection is the TRANSFAC database, which catalogs transcription factors, their known binding sites, and the corresponding profiles (PSSMs) [23]. In addition, a number of tools, such as MATRIX SEARCH [24], MatInd/MatInspector [25], Mapper [26], SIGNAL SCAN [27], and rVISTA [28], have been developed to enable the user to search an input sequence for matches to a PSSM or a library of PSSMs. However, PSSMs treat each position of the binding site as independent of the others. They cannot model the interactions between positions within DNA-binding sites, nor can they model explicit coevolution of related positions within binding sites. PSSMs normally describe only a fixed-length motif, whereas many DNA-binding proteins can bind to variable-length sites. Finally, it is not always feasible to construct the multiple alignment of binding sites needed to build a PSSM.
Graphical models were also introduced to represent the dependencies between positions [29,30]. In particular, Markov chains were utilized to statistically model the number and relative locations of TFBSs within a sequence. Although the hidden Markov model allows dependencies among positions to be encoded in the state transition probabilities [29], not all dependencies are treated systematically.
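The profile (PSSM) approach described above can be illustrated with a minimal sketch. This is not the BoCaTFBS method and not the implementation of any of the cited tools; the aligned sites, the uniform background frequencies, the pseudocount, and the function names `build_pssm`, `score_window`, and `scan` are all assumptions made for illustration.

```python
import math

# Hypothetical aligned binding sites (illustrative only, not real data).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA", "TGACTAA"]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Assumed smoothing constant so unseen bases do not give log(0).
PSEUDOCOUNT = 1.0


def build_pssm(sites, background=BACKGROUND, pseudocount=PSEUDOCOUNT):
    """Return one dict per motif position mapping base -> log2-odds score."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(background)
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base, bg_freq in background.items():
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / bg_freq)
        pssm.append(scores)
    return pssm


def score_window(pssm, window):
    """Sum per-position scores; each position contributes independently."""
    return sum(column[base] for column, base in zip(pssm, window))


def scan(pssm, sequence, threshold):
    """Slide a fixed-width window and report (start, window, score) hits."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(pssm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pssm = build_pssm(SITES)
    for start, window, score in scan(pssm, "CCTGACTCAGG", threshold=5.0):
        print(start, window, round(score, 3))
```

Because `score_window` is a plain sum of per-position terms, a score of this kind cannot reward or penalize particular combinations of nucleotides at different positions, and the fixed window width reflects the fixed-length limitation noted in the text.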
An optimized Markov chain algorithm was introduced to integrate pair-wise correlation into Markov models to predict a particular transcription factor's binding sites (hepatocyte nuclear factor 4α) [22]. An alternative approach, phylogenetic footprinting, identifies functional regulatory elements from noncoding DNA sequence conservation between related species [31-33]. It has successfully been applied to single genome loci, but this method is limited by the short length of functional binding sites and the large number of insertion/deletion events within regulatory regions. There are also other methods, such as maximal dependence decomposition [34] and the nonparametric method [35]. Singh and coworkers [15] evaluated traditional TFBS prediction methods and introduced per-position information content and local pair-wise nucleotide dependencies to four major traditional methods (for further detail, see Materials and methods, below). Their benchmark results on Escherichia coli transcription factors indicated that the best results were achieved by incorporating both per-position information content and local pair-wise correlation; however, all of the conventional methods of TFBS prediction generate a high false-positive rate when applied to the genome [36].
Local pair-wise correlation within TFBSs was discovered in some recent experimental and theoretical research. Microarray binding experiments indicated that nucleotides of TFBSs exert interdependent effects on the binding affinities of transcription factors [37]. Also, Kwiatkowski and coworkers [38], using principal coordinate analysis to predict the effects of single nucleotide polymorphisms within regulatory sequences on DNA-protein interactions, showed that there are nucleotide positions in TFBSs that interact with each other. Finding TFBSs is particularly challenging in the human genome in comparison with simpler organisms such as yeast and fly.
TFBSs can occur downstream, upstream, or possibly in the introns of the genes they regulate [8-10]. Moreover, the human genome is about 200 times larger than the yeast genome, and approximately 99% does not encode proteins. Thus, it can be very difficult to find TFBSs in noncoding sequences using relatively simple computational tools. In this postgenomic era, comprehensive high-throughput experiments (such as ChIP-chip) and gene annotation provide a huge amount of information about sites that are not bound by a factor, as well as some information about the sites that are bound. In fact, such techniques provide better information about nonbinding sites than about binding sites: the resolution of the binding sites is limited by the size of probes in the ChIP-chip experiments and only a limited number of binding regions are detected, whereas there is a very large amount of information on sites not bound. Moreover, the ENCyclopedia Of DNA Elements (ENCODE) Project [39] is expected to produce a surge in the availability of massive ChIP-chip datasets.

Similar resources

The Uncertain Promise of Law: Lessons from Bhopal

This paper describes the course of the litigation following the Bhopal disaster. It begins with a brief description of the various failures in risk assessment and management that gave rise to the hazardous conditions in Bhopal, and then describes in more detail the resulting legal proceedings. Specifying a number of modest criteria against which the success of the litigation can be measured, th...

Epidemiologic Methods Lessons Learned from Environmental Public Health Disasters: Chernobyl, the World Trade Center, Bhopal, and Graniteville, South Carolina

BACKGROUND Environmental public health disasters involving hazardous contaminants may have devastating effects. While much is known about their immediate devastation, far less is known about long-term impacts of these disasters. Extensive latent and chronic long-term public health effects may occur. Careful evaluation of contaminant exposures and long-term health outcomes within the constraints...

Chemical process safety at a crossroads.

will mark the 20th anniversary of the worst industrial accident in history, the chemical plant disaster in Bhopal, India, that killed thousands of people and injured tens of thousands more. Along with other safety professionals from around the world, I will be traveling to India this fall to reflect on what has changed and what we still must do to better protect the lives of workers and the pub...

Hospital Management in Infectious Disease Outbreak: Lessons Learned from COVID-19

Background: Biological events, including epidemics, pandemics, and emerging and reemerging infectious diseases, have significant adverse consequences on health. Hospitals have a major role in the management of outbreaks and the mitigation of their effects. During pandemics, health systems, especially hospitals, may be affected. Methods: Therefore, the current study aimed to collect and analyze lessons lea...

Factors Affecting Medication Errors from Nurses' Perspective: Lessons Learned

Introduction: Medical errors are among the most serious threats to patient safety in all countries. The most frequent medical errors are medication errors, which can lead to serious effects and even death in patients. Therefore, this study aimed to explain factors affecting medication errors from the viewpoints of nurses in order to present strategies to reduce these errors. Methods:...

The epidemiology of disasters and adverse reproductive outcomes: lessons learned.

A disaster has been defined as a disruption of human ecology that exceeds the capacity of the community to function normally. Little is known about the adverse effects of natural disasters on reproductive outcomes. Important lessons can be derived from several disasters caused by human factors, such as the Minamata Bay disaster. Adverse reproductive outcomes include infertility, early pregnancy...

Journal title:
  • Environmental Health Perspectives

Volume 112  Issue 

Pages  -

Publication date 2004